{ "cells": [ { "cell_type": "markdown", "id": "2e35e4f7", "metadata": {}, "source": [ "(xarray-intro)=\n", "# Xarray - brief introduction\n", "\n", "```{seealso}\n", "The complete source code of this tutorial can be found in\n", "\n", "{nb-download}`Xarray introduction.ipynb`\n", "```\n", "\n", "The Quantify dataset is based on {doc}`Xarray `.\n", "This subsection is a very brief overview of some concepts and functionalities of xarray.\n", "Here we use only pure xarray concepts and terminology.\n", "\n", "This is not intended as an extensive introduction to xarray.\n", "Please consult the {doc}`xarray documentation ` if you have never used it\n", "before (it has very neat features!)." ] }, { "cell_type": "code", "execution_count": null, "id": "fbeea621", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "import numpy as np\n", "import xarray as xr\n", "from rich import pretty\n", "\n", "pretty.install()" ] }, { "cell_type": "markdown", "id": "c8bd036f", "metadata": {}, "source": [ "There are different ways to create a new xarray dataset.\n", "Below we demonstrate a few of them to showcase specific functionalities.\n", "\n", "An xarray dataset has **Dimensions** and **Variables**. 
Variables \"lie\" along at least\n", "one dimension:" ] }, { "cell_type": "code", "execution_count": null, "id": "63e688c4", "metadata": {}, "outputs": [], "source": [ "n = 5\n", "\n", "values_pos = np.linspace(-5, 5, n)\n", "dimensions_pos = (\"position_x\",)\n", "# the \"unit\" and \"long_name\" are a convention for automatic plotting\n", "attrs_pos = dict(unit=\"m\", long_name=\"Position\")  # attributes of this data variable\n", "\n", "values_vel = np.linspace(0, 10, n)\n", "dimensions_vel = (\"velocity_x\",)\n", "attrs_vel = dict(unit=\"m/s\", long_name=\"Velocity\")\n", "\n", "data_vars = dict(\n", "    position=(dimensions_pos, values_pos, attrs_pos),\n", "    velocity=(dimensions_vel, values_vel, attrs_vel),\n", ")\n", "\n", "dataset_attrs = dict(my_attribute_name=\"some meta information\")\n", "\n", "dataset = xr.Dataset(\n", "    data_vars=data_vars,\n", "    attrs=dataset_attrs,  # dataset attributes\n", ")\n", "dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "dc96be76", "metadata": {}, "outputs": [], "source": [ "dataset.dims" ] }, { "cell_type": "code", "execution_count": null, "id": "f4b62946", "metadata": {}, "outputs": [], "source": [ "dataset.variables" ] }, { "cell_type": "markdown", "id": "4898cb49", "metadata": {}, "source": [ "A variable can be \"promoted\" to (or defined as) a **Coordinate** for its dimension(s):" ] }, { "cell_type": "code", "execution_count": null, "id": "a180b23c", "metadata": {}, "outputs": [], "source": [ "values_vel = 1 + values_pos**2\n", "data_vars = dict(\n", "    position=(dimensions_pos, values_pos, attrs_pos),\n", "    # now the velocity array \"lies\" along the same dimension as the position array\n", "    velocity=(dimensions_pos, values_vel, attrs_vel),\n", ")\n", "dataset = xr.Dataset(\n", "    data_vars=data_vars,\n", "    # NB We could set \"position\" as a coordinate directly when creating the dataset:\n", "    # coords=dict(position=(dimensions_pos, values_pos, attrs_pos)),\n", "    attrs=dataset_attrs,\n", ")\n", "\n", "# 
Promote the \"position\" variable to a coordinate:\n", "# In general, most of the functions that modify the structure of the xarray dataset will\n", "# return a new object, hence the assignment\n", "dataset = dataset.set_coords([\"position\"])\n", "dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "8e5f2caa", "metadata": {}, "outputs": [], "source": [ "dataset.coords[\"position\"]" ] }, { "cell_type": "markdown", "id": "1804e77a", "metadata": {}, "source": [ "Note that the xarray coordinates are available as variables as well:" ] }, { "cell_type": "code", "execution_count": null, "id": "a4a2851c", "metadata": {}, "outputs": [], "source": [ "dataset.variables[\"position\"]" ] }, { "cell_type": "markdown", "id": "8b12f767", "metadata": {}, "source": [ "On its own, this might not be very useful yet. However, xarray coordinates can be set\n", "to **index** other variables ({func}`~quantify_core.data.handling.to_gridded_dataset`\n", "does this for the Quantify dataset), as shown below (note the bold font in the output!):" ] }, { "cell_type": "code", "execution_count": null, "id": "f58a7fa2", "metadata": {}, "outputs": [], "source": [ "dataset = dataset.set_index({\"position_x\": \"position\"})\n", "dataset.position_x.attrs[\"unit\"] = \"m\"\n", "dataset.position_x.attrs[\"long_name\"] = \"Position x\"\n", "dataset" ] }, { "cell_type": "markdown", "id": "b74e4c79", "metadata": {}, "source": [ "This can be confusing at first. To clarify, we now have\n", "a dimension, a coordinate and a variable that all share the name `\"position_x\"`." 
] }, { "cell_type": "code", "execution_count": null, "id": "e8dcb228", "metadata": {}, "outputs": [], "source": [ "(\n", "    \"position_x\" in dataset.dims,\n", "    \"position_x\" in dataset.coords,\n", "    \"position_x\" in dataset.variables,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "8c96c7c1", "metadata": {}, "outputs": [], "source": [ "# `sizes` maps each dimension name to its length\n", "dataset.sizes[\"position_x\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "1b38f2df", "metadata": {}, "outputs": [], "source": [ "dataset.coords[\"position_x\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "bdc518fc", "metadata": {}, "outputs": [], "source": [ "dataset.variables[\"position_x\"]" ] }, { "cell_type": "markdown", "id": "d6b85f15", "metadata": {}, "source": [ "The intention here is only to make the reader aware of this peculiar behavior.\n", "Please consult the {doc}`xarray documentation ` for more details.\n", "\n", "This is useful, for example, to retrieve data from an xarray variable using\n", "one of its coordinates to select the desired entries:" ] }, { "cell_type": "code", "execution_count": null, "id": "e19a7914", "metadata": {}, "outputs": [], "source": [ "dataset.velocity" ] }, { "cell_type": "code", "execution_count": null, "id": "50f7acd8", "metadata": {}, "outputs": [], "source": [ "retrieved_value = dataset.velocity.sel(position_x=2.5)\n", "retrieved_value" ] }, { "cell_type": "markdown", "id": "550fee7f", "metadata": {}, "source": [ "Note that without this feature we would have to keep track of numpy integer indexes to\n", "retrieve the desired data:" ] }, { "cell_type": "code", "execution_count": null, "id": "afa78109", "metadata": {}, "outputs": [], "source": [ "dataset.velocity.values[3], retrieved_value.values == dataset.velocity.values[3]" ] }, { "cell_type": "markdown", "id": "ca6a51d7", "metadata": {}, "source": [ "One of the great features of xarray is automatic plotting (explore the xarray\n", "documentation for more advanced 
capabilities!):" ] }, { "cell_type": "code", "execution_count": null, "id": "5d37260f", "metadata": {}, "outputs": [], "source": [ "_ = dataset.velocity.plot(marker=\"o\")" ] }, { "cell_type": "markdown", "id": "23aa150d", "metadata": {}, "source": [ "Note the automatic labels and unit." ] } ], "metadata": { "file_format": "mystnb", "jupytext": { "text_representation": { "extension": ".md", "format_name": "myst" } }, "kernelspec": { "display_name": "python3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }